Which series is the best rated? Is the difference between the IMDb ratings of the series very large?

# Consider whether the mean is really representative
series_a_serem_analisadas = read_csv(here("data/series_from_imdb.csv"),
                                     progress = FALSE) %>%
                            filter(series_name %in% c("Mad Men", "Sherlock", "The Killing"))
Parsed with column specification:
cols(
  series_name = col_character(),
  episode = col_character(),
  series_ep = col_integer(),
  season = col_integer(),
  season_ep = col_integer(),
  url = col_character(),
  user_rating = col_double(),
  user_votes = col_double(),
  r1 = col_double(),
  r2 = col_double(),
  r3 = col_double(),
  r4 = col_double(),
  r5 = col_double(),
  r6 = col_double(),
  r7 = col_double(),
  r8 = col_double(),
  r9 = col_double(),
  r10 = col_double()
)
medias_imd_por_serie = group_by(series_a_serem_analisadas, series_name) %>%
                       summarize(media_imdb = mean(user_rating))

We compute each series' IMDb mean by averaging the viewer ratings of its episodes. Each episode rating, in turn, is a weighted average that combines the scores, which range from 1 to 10, with the number of people who gave each score. We can therefore suspect that our IMDb mean is representative, that is, that most people's ratings fall close to it. That said, among the chosen series the highest mean belongs to Sherlock, at approximately 8.9, but the others are not far behind.
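To make the two levels of averaging concrete, here is a minimal sketch in base R. The vote shares and the extra episode ratings are hypothetical values invented for illustration, and the exact weighting IMDb applies is not described in this analysis, so a plain weighted mean is assumed.

```r
# Hypothetical share of voters who gave each score from 1 to 10 to one episode
scores     <- 1:10
vote_share <- c(0.01, 0.00, 0.01, 0.01, 0.02, 0.05, 0.15, 0.30, 0.25, 0.20)

# Episode rating: mean of the scores weighted by how many people gave each one
episode_rating <- weighted.mean(scores, vote_share)

# Series mean: simple (unweighted) mean of the episode ratings,
# the same operation as mean(user_rating) in the analysis
episode_ratings <- c(episode_rating, 8.5, 9.0)  # two more hypothetical episodes
series_mean     <- mean(episode_ratings)

print(episode_rating)
print(series_mean)
```

Note that the series mean gives every episode the same weight, regardless of how many people voted on it.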

# Bar chart with the mean IMDb rating of each series
medias_series = plot_ly(medias_imd_por_serie,
                        x = ~series_name,
                        y = ~media_imdb,
                        name = "Mean IMDb rating per series",
                        type = "bar",
                        color = ~series_name) %>%
                        layout(yaxis = list(title = "Mean IMDb rating"),
                               xaxis = list(title = "Series"),
                               barmode = "group")
medias_series

However, the box plot below shows that The Killing has the most homogeneous distribution of ratings: its episodes were rated more alike. The episode ratings of Mad Men and Sherlock are more dispersed, with larger differences from one episode to another, and Mad Men has the widest gap between its lowest and highest rating. In addition, the median and the mean of each series are close to each other, which supports the idea that the mean represents well what people think of these three series.

# Box plot of the episode ratings of each series
variacoes_notas = plot_ly(series_a_serem_analisadas,
                          x = ~series_name,
                          y = ~user_rating,
                          type = "box",
                          color = ~series_name) %>%
                          layout(yaxis = list(title = "IMDb rating"),
                                 xaxis = list(title = "Series"))
variacoes_notas
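The comparison between mean and median, and between the spreads of the series, can also be checked numerically rather than read off the box plot. The sketch below uses a small hypothetical data frame (`ratings`, with invented series names and values) in place of the real data; only the column names `series_name` and `user_rating` come from the analysis.

```r
# Hypothetical episode ratings for two made-up series
ratings <- data.frame(
  series_name = rep(c("A", "B"), each = 5),
  user_rating = c(8.0, 8.1, 8.2, 8.1, 8.0,
                  7.5, 8.5, 9.0, 8.0, 9.5)
)

# Split the ratings by series and compute centre and spread for each group
por_serie <- split(ratings$user_rating, ratings$series_name)
resumo <- data.frame(
  series_name = names(por_serie),
  mean        = sapply(por_serie, mean),
  median      = sapply(por_serie, median),
  iqr         = sapply(por_serie, IQR),
  range       = sapply(por_serie, function(x) diff(range(x))),
  row.names   = NULL
)

# A small gap between mean and median suggests the mean is not being
# pulled away by a few extreme episodes
resumo$mean_minus_median <- resumo$mean - resumo$median
print(resumo)
```

A larger `iqr` or `range` corresponds to a taller box or longer whiskers in the plot.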

But do the ratings of the series change much from season to season?

In the chart below, we can observe two interesting cases. The audience does not seem to have enjoyed the last season of Sherlock much: the mean rating of the fourth season fell by 0.625 relative to the third and is the lowest of the series. The Killing, on the other hand, strange as it may seem, especially to anyone who has seen its score on Rotten Tomatoes, seems to please the audience more and more, with a curve that rises every season. As for Mad Men, the season means do not vary much, always staying above 8 and below 9. Even so, we can see that the fifth and sixth seasons are not the favorites.

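# Mean episode rating for each combination of series and season
media_por_temporada = aggregate(series_a_serem_analisadas$user_rating,
                                by = list(series_name = series_a_serem_analisadas$series_name,
                                          season = series_a_serem_analisadas$season),
                                mean)

# aggregate() names the value column "x"; give it a descriptive name
colnames(media_por_temporada)[3] <- "season_mean"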
# Line chart of the season means, one line per series
media_temporada = plot_ly(media_por_temporada,
                          x = ~season,
                          y = ~season_mean,
                          color = ~series_name,
                          type = "scatter",
                          mode = "lines") %>%
                  layout(yaxis = list(title = "Season IMDb mean"),
                         xaxis = list(title = "Season"))

media_temporada
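A season-to-season change, such as the drop reported for Sherlock, can be computed directly from a table of season means instead of being read from the chart. This is a minimal sketch with hypothetical shows and values; it assumes a data frame with the same three columns as `media_por_temporada`.

```r
# Hypothetical season means for two made-up shows
season_means <- data.frame(
  series_name = rep(c("Show A", "Show B"), each = 3),
  season      = rep(1:3, times = 2),
  season_mean = c(9.0, 9.1, 8.5,
                  8.0, 8.2, 8.4)
)

# Order by series and season so that diff() compares consecutive seasons
season_means <- season_means[order(season_means$series_name,
                                   season_means$season), ]

# Change relative to the previous season, computed within each series;
# the first season of a series has no predecessor, hence NA
season_means$change <- ave(
  season_means$season_mean,
  season_means$series_name,
  FUN = function(x) c(NA, diff(x))
)
print(season_means)
```

A series whose `change` values are all positive has a strictly rising curve, and the most negative value marks the sharpest drop.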